ARIMAX/SARIMAX Model for Sector Market and Macroeconomic Factors as Exogenous Variables
The ARIMAX/SARIMAX model is a useful tool for analyzing the relationships between sector markets and macroeconomic factors treated as exogenous variables. By modeling the endogenous series together with exogenous regressors, the ARIMAX/SARIMAX model can provide a more complete analysis of the stock market than a model based on the series' own history alone.
In the case of sector markets, the ARIMAX/SARIMAX model can be used to identify important relationships between different sectors and their performance. For example, if the technology sector is performing well, we might expect to see a corresponding increase in the performance of other sectors that rely on technology.
By including macroeconomic factors as exogenous variables, the ARIMAX/SARIMAX model can also help to identify how changes in the broader economy might impact the performance of different sectors. For example, if interest rates are expected to rise, we might expect to see a decrease in the performance of sectors that are particularly sensitive to interest rates, such as real estate or financial sectors.
If the findings show that the endogenous and exogenous variables in the time series data are not interdependent, the ARIMAX model can be a good choice for predicting the stock market indices. If there is seasonality in the data, the SARIMAX model can be used to account for this seasonal variation. If there is no seasonality, the simpler non-seasonal ARIMAX model can be used instead of SARIMAX.
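The choice between a non-seasonal and a seasonal specification can be sketched with base R's arima(), which accepts exogenous regressors through xreg and seasonal terms through seasonal. This is a minimal illustration under stated assumptions: the data are simulated, the names x and y are hypothetical, and the model orders are picked for demonstration rather than taken from the analysis.

```r
# Illustrative only: simulated quarterly data, not the sector data set.
set.seed(123)
n <- 48
x <- rnorm(n)                                        # hypothetical exogenous series
e <- as.numeric(arima.sim(list(ar = 0.5), n = n))    # AR(1) errors
y <- ts(0.8 * x + e, start = c(2010, 1), frequency = 4)

# ARIMAX: regression on x with ARIMA(1,0,0) errors, no seasonal terms.
fit_arimax <- arima(y, order = c(1, 0, 0), xreg = x, method = "ML")

# SARIMAX: same regressor, plus a seasonal AR(1) term at the quarterly period.
fit_sarimax <- arima(y, order = c(1, 0, 0),
                     seasonal = list(order = c(1, 0, 0), period = 4),
                     xreg = x, method = "ML")

# Compare the two fits; a lower AIC favors that specification for these data.
aic_values <- c(arimax = AIC(fit_arimax), sarimax = AIC(fit_sarimax))
aic_values
```

Because the simulated series has no seasonal component, the extra seasonal term would not be expected to help much here; with real data, the comparison should be combined with residual diagnostics.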
Let’s examine the relationship between endogenous and exogenous variables before proceeding with the ARIMAX/SARIMAX model.
Code
library(TSstudio)  # provides ts_plot()
# Plot the raw sector and macroeconomic series together
ts_plot(sector_factor_data,
        title = "Sector Market and Macroeconomic Variables",
        Ytitle = "Values",
        Xtitle = "Year")
Code
# Columns to normalize: adjusted sector series and macroeconomic indicators
numeric_vars_sector_factor_data <- c("XLB.Adjusted", "XLC.Adjusted", "XLE.Adjusted",
                                     "XLF.Adjusted", "XLI.Adjusted", "XLP.Adjusted",
                                     "XLK.Adjusted", "XLRE.Adjusted", "XLU.Adjusted",
                                     "XLV.Adjusted", "XLY.Adjusted",
                                     "gdp", "interest", "inflation", "unemployment")
numeric_sector_factor_data <- sector_factor_data[, numeric_vars_sector_factor_data]
# scale() centers each column to mean 0 and rescales it to unit standard deviation
normalized_sector_factor_data_numeric <- scale(numeric_sector_factor_data)
normalized_sector_factor_data <- ts(normalized_sector_factor_data_numeric,
                                    start = c(2010, 1), end = c(2021, 10), frequency = 4)
ts_plot(normalized_sector_factor_data,
        title = "Normalized Time Series Data for Sector Market and Macroeconomic Variables",
        Ytitle = "Normalized Values",
        Xtitle = "Year")
Plotting the raw time series data for these variables can be challenging due to differences in scale and units of measurement. Therefore, normalizing the data can provide a clearer picture of the relationships and patterns in the data.
In the Normalized Time Series Data for Sector Market and Macroeconomic Variables plot, each variable has been converted to z-scores with scale(), which subtracts the column mean and divides by the column standard deviation. This puts all variables on a common, unitless scale, which allows a fair comparison between them and removes the effect of differing scales or units of measurement.
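To make the z-score step concrete, the following small sketch applies scale() to a toy data frame and checks the result against the formula computed by hand. The data frame toy and its columns price and rate are made up for illustration and are not part of the sector data set.

```r
# Toy data on very different scales (hypothetical values).
toy <- data.frame(price = c(50, 55, 60, 58, 65),
                  rate  = c(0.25, 0.50, 0.50, 0.75, 1.00))

# scale() returns a matrix of z-scores, one column per variable.
z <- scale(toy)

# After scaling, every column has mean 0 and standard deviation 1
# (up to floating-point error).
col_means <- colMeans(z)
col_sds   <- apply(z, 2, sd)

# The same transformation written out by hand for one column.
z_price_manual <- (toy$price - mean(toy$price)) / sd(toy$price)
all.equal(as.numeric(z[, "price"]), z_price_manual)
```

Note that scale() uses the sample standard deviation (denominator n - 1), which is also what sd() computes.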
Normalizing the data is essential when analyzing the relationship between sector market data and macroeconomic factors. It can help remove any bias or distortion introduced by variables with different units or magnitudes and can stabilize the estimation of models. Additionally, normalizing the data can improve the interpretability of model coefficients, making it easier to assess their relative importance. By normalizing the time series data, it is possible to gain insights into the relationships between sector market data and macroeconomic factors and to make informed investment decisions based on these insights.
Cross-Correlation for the Variables and Selection of Feature Variables
Cross-correlation is a statistical technique used to measure the relationship between two or more variables in a time series. In the context of ARIMAX modeling, cross-correlation is often used for feature selection. For selecting feature variables in our analysis, we will first examine the correlation through a heatmap among all the variables, and then analyze the autocorrelation function (ACF) plots between the response variable and the exogenous variables.
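As a rough illustration of how cross-correlation can point to a useful lag of an exogenous variable, the sketch below uses base R's ccf() on simulated data. The series x and y are hypothetical, with y constructed to follow x by two periods; this is one possible way to inspect lead and lag relationships, not necessarily the procedure used later in this analysis.

```r
# Simulated example: y follows x with a delay of two periods plus noise.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- c(rep(0, 2), x[1:(n - 2)]) + rnorm(n, sd = 0.3)

# In ccf(x, y), the value at lag k estimates the correlation between
# x[t + k] and y[t], so a peak at a negative lag means x leads y.
cc <- ccf(x, y, lag.max = 8, plot = FALSE)

# Lag with the largest absolute cross-correlation.
peak_lag <- cc$lag[which.max(abs(cc$acf))]
peak_lag
```

Setting plot = TRUE (the default) draws the cross-correlation plot with approximate confidence bounds, which is usually the more convenient view when screening several candidate variables.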
Correlation Heatmap
Code
library(ggplot2)   # ggplot(), geom_tile(), scale_fill_gradient2(), theme()
library(reshape2)  # melt() on a matrix yields the columns Var1, Var2, value
# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat){
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}
cormat <- round(cor(normalized_sector_factor_data_numeric), 2)
upper_tri <- get_upper_tri(cormat)
melted_cormat <- melt(upper_tri, na.rm = TRUE)
# Create a ggheatmap
ggheatmap <- ggplot(melted_cormat, aes(Var2, Var1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() + # minimal theme
  theme(axis.text.x = element_text(angle = 45, vjust = 1,
                                   size = 12, hjust = 1)) +
  coord_fixed()
# Add the correlation values as labels and tidy up the theme
ggheatmap +
  geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(1, 0),
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal") +
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                               title.position = "top", title.hjust = 0.5))
The heatmap reveals important insights into the relationships between the sector markets and the economic indicators. The sector series show strong positive correlations with inflation, positive correlations with the GDP growth rate, and negative correlations with the unemployment rate, which suggests that these variables may play a significant role in influencing sector market movements. In contrast, the correlations between the sector markets and interest rates are weaker, indicating that interest rates may have less impact on sector market fluctuations, except in the energy sector. These findings provide valuable guidance for selecting relevant exogenous variables in the ARIMAX/SARIMAX model to better understand and forecast stock market dynamics.
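The same correlation matrix can also be used to rank candidate exogenous variables for a single response series. The sketch below shows one way to do this on simulated data; the column names mirror those in the analysis, but the values are synthetic, and ranking by absolute correlation is only a simple screening heuristic assumed here for illustration.

```r
# Simulated stand-ins for one sector series and three macro indicators.
set.seed(1)
n <- 40
inflation    <- rnorm(n)
unemployment <- rnorm(n)
interest     <- rnorm(n)
XLK <- 0.9 * inflation - 0.6 * unemployment + rnorm(n, sd = 0.5)

toy_cor <- cor(cbind(XLK, inflation, unemployment, interest))

# Correlations of the response with each candidate exogenous variable.
macro_cor <- toy_cor["XLK", c("inflation", "unemployment", "interest")]

# Rank candidates by the strength (absolute value) of the correlation.
ranked <- macro_cor[order(abs(macro_cor), decreasing = TRUE)]
round(ranked, 2)
```

A contemporaneous correlation says nothing about lead and lag effects, so a ranking like this is best treated as a first filter before checking lagged relationships.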